I. INTRODUCTION



The essence of any scientific progress is the comparison of theoretical predictions to
experimental data. Statistics provides the scientist with so-called goodness-of-fit tests, which
allow one to obtain well-defined probability statements about the agreement of a theory with
data. By far the most popular goodness-of-fit test dates back to 1900, when K. Pearson
identified the minimum of a $\chi^2$-function as a powerful tool to evaluate the quality of the
fit [1]. However, it is known that the Pearson $\chi^2_{\rm min}$ test is not very restrictive in global
analyses, where data from different experiments with a large number of data points are
compared to a theory depending on many parameters. The reason for this is that in such a
case a given parameter is often constrained only by a small subset of the data. If the rest of
the data (which can contain many data points) are reasonably fitted, a possible problem in
the fit of the given parameter is completely washed out by the large number of data points.
A discussion of this problem in various contexts can be found e.g. in Refs. 1, 

To evade this problem a modification of the original $\chi^2_{\rm min}$ test was proposed in Ref. [5] to
evaluate the goodness-of-fit of neutrino oscillation data in the framework of four-neutrino 
models. There this method was called parameter goodness-of-fit (PG), and it can be applied 
when the global data consists of statistically independent subsets. The PG is based on 
parameter estimation and hence it avoids the problem of being diluted by many data points. 
It tests the compatibility of the different data sets in the framework of the given theoretical 
model. In this note we give a formal derivation of the probability distribution function 
(p.d.f.) for the test statistic of the PG, and discuss the application and interpretation 
of the PG on some examples. The original motivation for the PG was the analysis of 
neutrino oscillation data. However, the method may be very useful also in other fields of 
physics, especially where global fits of many parameters to data from several experiments 
are performed. 

The outline of the paper is as follows. In Sec. II we define the PG and show that its
construction is very similar to that of the standard goodness-of-fit. The formal derivation
of the p.d.f. for the PG test statistic is given in Sec. III, whereas in Sec. IV a discussion of
the application and interpretation of the PG is presented. In Sec. V we consider the PG in
the case of correlations due to theoretical errors, and we conclude in Sec. VI.



II. GOODNESS-OF-FIT TESTS 

We would like to start the discussion by citing the goodness-of-fit definition given by the 
Particle Data Group (see Sec. 31.3.2. of Ref. pj): "Often one wants to quantify the level 
of agreement between the data and a hypothesis without explicit reference to alternative 
hypotheses. This can be done by defining a goodness-of-fit statistic, t, which is a function of 
the data whose value reflects in some way the level of agreement between the data and the 
hypothesis. [...] The hypothesis in question, say, $H_0$ will determine the p.d.f. $g(t|H_0)$ for the
statistic. The goodness-of-fit is quantified by giving the p-value, defined as the probability
to find t in the region of equal or lesser compatibility with $H_0$ than the level of compatibility
observed with the actual data. For example, if t is defined such that large values correspond
to poor agreement with the hypothesis, then the p-value would be

$$ p = \int_{t_{\rm obs}}^{\infty} g(t|H_0)\, dt , \tag{1} $$

where $t_{\rm obs}$ is the value of the statistic obtained in the actual experiment."

Let us stress that from this definition of goodness-of-fit one has complete freedom in 
choosing a test statistic t, as long as the correct p.d.f. for it is used. 



A. The standard goodness-of-fit 



Consider N random observables $\nu = (\nu_i)$ and let $\mu_i(\theta)$ denote the expectation value for
the observable $\nu_i$, where $\theta = (\theta_\alpha)$ are P independent parameters which we wish to estimate
from the data. Assuming that the covariance matrix S is known one can construct the
following $\chi^2$-function:

$$ \chi^2(\theta) = [\nu - \mu(\theta)]^T S^{-1} [\nu - \mu(\theta)] \tag{2} $$

and use its minimum $\chi^2_{\rm min}$ as test statistic for goodness-of-fit evaluation:

$$ t(\nu) = \chi^2_{\rm min} . \tag{3} $$

The hypothesis we want to test determines the p.d.f. g(t) for this statistic. Once the real
experiments have been performed, giving the results $\nu_{\rm obs}$, the goodness-of-fit is given by the
probability of obtaining a t larger than $t_{\rm obs}$, as expressed by Eq. (1). We will refer to this
procedure as standard goodness-of-fit (SG):

$$ p_{\rm SG} = \int_{\chi^2_{\rm min}(\nu_{\rm obs})}^{\infty} g(t)\, dt . \tag{4} $$

The great success of this method is mostly due to a very powerful theorem, which was
proven over 100 years ago by K. Pearson$^1$ [1] and which greatly simplifies the task of
calculating the integral in Eq. (4). It can be shown under quite general conditions (see e.g.
Ref. 0) that $\chi^2_{\rm min}$ follows a $\chi^2$-distribution with $N - P$ degrees of freedom (d.o.f.), so that
$g(t) = f_{\chi^2}(t, N - P)$. Therefore, the integral in Eq. (4) becomes:

$$ p_{\rm SG} = {\rm CL}(\chi^2_{\rm min}(\nu_{\rm obs}), N - P) = \int_{\chi^2_{\rm min}(\nu_{\rm obs})}^{\infty} f_{\chi^2}(t, N - P)\, dt , \tag{5} $$

where ${\rm CL}(\chi^2, n)$ is the confidence level function (see e.g. Fig. 31.1 of Ref. p).
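
As a minimal sketch of how the tail integral of Eq. (5) can be evaluated in practice, the following Python snippet computes the upper tail probability of a $\chi^2$-distribution using only the standard library. The helper name `chi2_sf` is our own choice and not part of the original analysis, and the recurrence used here assumes an integer number of d.o.f. The input 126.7 with 140 d.o.f. is the solar + atmospheric fit discussed in Sec. IV C, for which the text quotes an SG of 78.3%.

```python
import math

def chi2_sf(chi2, ndof):
    """Upper tail P(t > chi2) of a chi^2 distribution with integer ndof >= 1.

    Hypothetical helper (not from the paper). With h = chi2/2 it uses the
    recurrence Q(a+1, h) = Q(a, h) + h**a * exp(-h) / Gamma(a+1) for the
    regularized upper incomplete gamma function, starting from
    Q(1, h) = exp(-h) for even ndof or Q(1/2, h) = erfc(sqrt(h)) for odd ndof.
    """
    if ndof < 1 or chi2 < 0:
        raise ValueError("need ndof >= 1 and chi2 >= 0")
    h = 0.5 * chi2
    if h == 0.0:
        return 1.0
    if ndof % 2 == 0:
        a, q = 1.0, math.exp(-h)
    else:
        a, q = 0.5, math.erfc(math.sqrt(h))
    while a < 0.5 * ndof:
        # work with logarithms so that large h and large a do not overflow
        q += math.exp(a * math.log(h) - h - math.lgamma(a + 1.0))
        a += 1.0
    return q

# Eq. (5): p_SG = CL(chi2_min, N - P)
p_sg = chi2_sf(126.7, 140)
print(f"p_SG = {p_sg:.3f}")
```

For non-integer d.o.f. or production use, a library implementation of the regularized incomplete gamma function would be the more robust choice.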

In the following we propose a modification of the SG, for the case when the data can be 
divided into several statistically independent subsets. 



1 Pearson uses the slightly different test statistic

$$ \chi^2_{\rm Pearson} = \sum_i \frac{[\nu_i - \mu_i(\theta)]^2}{\mu_i(\theta)} $$

and assumes that the $\nu_i$ are independent. We prefer to use instead the $\chi^2$ of Eq. (2), because in this way
also correlated data can be considered.

B. The parameter goodness-of-fit



Consider D statistically independent sets of random observables $\nu^r = (\nu^r_i)$ ($r = 1, \ldots, D$),
each consisting of $N_r$ observables ($i = 1, \ldots, N_r$), with $N_{\rm tot} = \sum_r N_r$. Now a theory
depending on P parameters $\theta = (\theta_\alpha)$ is confronted with the data. The total $\chi^2$ is given by

$$ \chi^2_{\rm tot}(\theta) = \sum_{r=1}^{D} \chi^2_r(\theta) , \tag{6} $$

where

$$ \chi^2_r(\theta) = [\nu^r - \mu^r(\theta)]^T S_r^{-1} [\nu^r - \mu^r(\theta)] \tag{7} $$

is the $\chi^2$ of the data set r. Now we define

$$ \bar\chi^2(\theta) = \chi^2_{\rm tot}(\theta) - \sum_{r=1}^{D} \chi^2_{r,{\rm min}} , \tag{8} $$

where $\chi^2_{r,{\rm min}} = \chi^2_r(\hat\theta_r)$, and $\hat\theta_r = \hat\theta_r(\nu^r)$ are the values of the parameters which minimize $\chi^2_r$.
Instead of the total $\chi^2$-minimum we propose now to use

$$ t(\nu) = \bar\chi^2_{\rm min} = \bar\chi^2(\hat\theta) \tag{9} $$

as test statistic for goodness-of-fit evaluation. In Eq. (9) $\bar\chi^2_{\rm min}$ is the minimum of $\bar\chi^2$ defined
in Eq. (8), and $\hat\theta$ are the parameter values at the minimum of $\bar\chi^2$, or equivalently of $\chi^2_{\rm tot}$. If
we now denote by $\bar g(t)$ the p.d.f. for this statistic, we can define the corresponding
goodness-of-fit by means of Eq. (1), in complete analogy to the SG case:

$$ p_{\rm PG} = \int_{\bar\chi^2_{\rm min}(\nu_{\rm obs})}^{\infty} \bar g(t)\, dt . \tag{10} $$

This procedure was proposed in Ref. [5] with the name parameter goodness-of-fit (PG). Its
construction is very similar to the SG, except that now $\bar\chi^2$ rather than $\chi^2$ is used to define
the test statistic.

In the next section we will show that also in the case of the PG the calculation of the
integral appearing in Eq. (10) can be greatly simplified. Let us define

$$ P_r = {\rm rank}\left[ \frac{\partial \mu^r}{\partial \theta} \right] . \tag{11} $$

This corresponds to the number of independent parameters (or parameter combinations)
constrained by a measurement of $\mu^r$.$^2$ Then under general conditions $\bar\chi^2_{\rm min}$ is distributed as
a $\chi^2$ with $P_c = \sum_r P_r - P$ d.o.f., so that Eq. (10) reduces to:

$$ p_{\rm PG} = {\rm CL}(\bar\chi^2_{\rm min}(\nu_{\rm obs}), P_c) . \tag{12} $$

2 If in some pathological cases $P_r$ depends on the point in the parameter space, Eq. (11) should be evaluated
at the true values of the parameters, see Sec. III B.
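
To make the counting in Eqs. (11) and (12) concrete, here is a small sketch that determines $P_r$ as the numerical rank of a derivative matrix and then forms $P_c = \sum_r P_r - P$. The function `matrix_rank` and the three toy derivative matrices are purely illustrative assumptions; they do not correspond to any data set of the paper.

```python
def matrix_rank(rows, tol=1e-10):
    """Rank of a small dense matrix via Gaussian elimination with pivoting.

    Illustrative helper; entries smaller than tol are treated as zero.
    """
    m = [[float(x) for x in row] for row in rows]
    n_cols = len(m[0]) if m else 0
    rank = 0
    for col in range(n_cols):
        # choose the largest remaining entry of this column as pivot
        pivot = max(range(rank, len(m)), key=lambda i: abs(m[i][col]), default=None)
        if pivot is None or abs(m[pivot][col]) < tol:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        for i in range(rank + 1, len(m)):
            f = m[i][col] / m[rank][col]
            for j in range(col, n_cols):
                m[i][j] -= f * m[rank][j]
        rank += 1
    return rank

# Toy derivative matrices d(mu^r)/d(theta) for P = 3 parameters (made-up numbers).
jac_1 = [[1, 0, 0], [0, 1, 1], [1, 1, 1]]   # constrains theta_1 and theta_2 + theta_3
jac_2 = [[0, 0, 1], [0, 0, 2]]              # constrains theta_3 only
jac_3 = [[0, 1, 0]]                         # constrains theta_2 only

n_params = 3
ranks = [matrix_rank(j) for j in (jac_1, jac_2, jac_3)]   # the P_r of Eq. (11)
p_c = sum(ranks) - n_params                               # d.o.f. entering Eq. (12)
print(ranks, p_c)
```

In a real analysis the derivative matrices would be evaluated at (an estimate of) the true parameter values, as footnote 2 indicates.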
III. THE PROBABILITY DISTRIBUTION FUNCTION OF $\bar\chi^2_{\rm min}$



In this section we derive the distribution of the test statistic for the PG. This can be done 
in complete analogy to the SG. Therefore, we start by reviewing the corresponding proof for 
the SG, see e.g. Ref. 0- 

A. The standard goodness-of-fit 

Let us start from the $\chi^2$ defined in Eq. (2). Since the covariance matrix S is a real,
positive and symmetric matrix one can always find an orthogonal matrix O and a diagonal
matrix s such that $S^{-1} = O^T s^2 O$. Hence, we can write the $\chi^2$ in the following way:

$$ \chi^2(\theta) = [\nu - \mu(\theta)]^T S^{-1} [\nu - \mu(\theta)] = y(\theta)^T y(\theta) , \tag{13} $$

where we have defined the new variables $y(\theta) = sO[\nu - \mu(\theta)]$. Let us denote the (unknown)
true values of the parameters by $\theta^0$ and we define

$$ x = y(\theta^0) = sO[\nu - \mu(\theta^0)] . \tag{14} $$

Now we assume that the $x_i$ are normal distributed with mean zero and the covariance matrix
$1_N$, which in particular implies that they are statistically independent. This assumption is
obviously correct if the data $\nu_i$ are normal distributed with mean $\mu_i(\theta^0)$ and covariance
matrix S. However, it can be shown (see e.g. Refs. IE 0) that this assumption holds
for a large class of arbitrary p.d.f. for the data under quite general conditions, especially in
the large sample limit, i.e. large $\nu_i$. Under this assumption it is evident that $\chi^2(\theta^0) = x^T x$
follows a $\chi^2$-distribution with N d.o.f. According to Eq. (3) the test statistic t for the
SG is given by the minimum of Eq. (13). To derive the p.d.f. for t we state the following
proposition:

Proposition 1 Let $\hat\theta$ be the values of the parameters which minimize Eq. (13). Then

$$ \chi^2_{\rm min} = \chi^2(\theta^0) - \Delta\chi^2 , \tag{15} $$

with $\chi^2_{\rm min} = \hat y^T \hat y$ and $\hat y = y(\hat\theta)$, has a $\chi^2$-distribution with $N - P$ d.o.f. and $\Delta\chi^2$ has a
$\chi^2$-distribution with P d.o.f. and is statistically independent of $\chi^2_{\rm min}$.

A rigorous proof of this proposition is somewhat intricate and can be found e.g. in Ref. 0. 
In the following we give an outline of the proof dispensing with mathematical details for the 
sake of clarity. 

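The $\hat\theta$ are obtained by solving the equations

$$ \left. \frac{\partial \chi^2}{\partial \theta_\alpha} \right|_{\hat\theta} = 2\, \hat y^T \left. \frac{\partial y}{\partial \theta_\alpha} \right|_{\hat\theta} = 0 . \tag{16} $$

It can be proved (see e.g. Ref. jjfl) under very general conditions that Eqs. (16) have a unique
solution which converges to the true values $\theta^0$ in the large sample limit. In this sense it
is a good approximation$^3$ to write

$$ \hat y \approx x + B(\hat\theta - \theta^0) , \tag{17} $$

where we have defined the rectangular $N \times P$ matrix B by

$$ B = \left. \frac{\partial y}{\partial \theta} \right|_{\theta^0} . \tag{18} $$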



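Without loss of generality we assume that$^4$ rank[B] = P. From Eq. (17) we obtain

$$ \left. \frac{\partial y}{\partial \theta} \right|_{\hat\theta} \approx \left. \frac{\partial y}{\partial \theta} \right|_{\theta^0} = B . \tag{19} $$

Using this last relation in Eq. (16) we find that $\hat y$ fulfils $\hat y^T B = 0$. Multiplying Eq. (17)
from the left side by $B^T$ this leads to

$$ B^T x = -B^T B(\hat\theta - \theta^0) . \tag{20} $$

Using Eqs. (17) and (20) we obtain

$$ \hat y^T \hat y = x^T x - (\hat\theta - \theta^0)^T B^T B (\hat\theta - \theta^0) . \tag{21} $$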



The symmetric $P \times P$ matrix $B^T B$ can be written as $B^T B = R b^2 R^T$ with the orthogonal
matrix R and the diagonal matrix b, and Eq. (20) implies $b^{-1} R^T B^T x = -b R^T (\hat\theta - \theta^0)$.
Defining the $N \times P$ matrix

$$ H = B R b^{-1} \tag{22} $$

we find $(\hat\theta - \theta^0)^T B^T B (\hat\theta - \theta^0) = x^T H H^T x$, and Eq. (21) becomes

$$ \hat y^T \hat y = x^T (1_N - H H^T) x . \tag{23} $$

Note that the matrix H obeys the orthogonality relation $H^T H = 1_P$, showing that the P
column vectors of length N in H are orthogonal. We can add $N - P$ columns to the matrix
H completing it to an orthogonal $N \times N$ matrix: $V = (H, K)$. Here K is an $N \times (N - P)$
matrix with $K^T K = 1_{N-P}$, $H^T K = 0$ and the completeness relation

$$ V V^T = H H^T + K K^T = 1_N . \tag{24} $$



3 Note that Eq. (17) is exact if the y depend linearly on the parameters $\theta$.

4 If rank[B] = P' < P some of the parameters $\theta_\alpha$ are not independent. In this case one can perform a
change of variables and choose a new set of parameters $\theta'_\beta$, such that $\chi^2(\theta')$ depends only on the first
P' of them. The remaining parameters are not relevant for the problem and can be eliminated from the
very beginning. When repeating the construction in the new set of variables, the number of parameters
will be equal to the rank of B.

Now we transform to the new variables

$$ \begin{pmatrix} v \\ w \end{pmatrix} = V^T x = \begin{pmatrix} H^T x \\ K^T x \end{pmatrix} , \tag{25} $$

where $v = H^T x$ is a vector of length P and $w = K^T x$ is a vector of length $N - P$. In
general, if the covariance matrix of the random variables x is S, then the covariance matrix
S' of $x' = V^T x$ is given by $S' = V^T S V$. Hence, since in the present case the $x_i$ are normal
distributed with mean zero and covariance matrix $1_N$ the same is true for the $x'_i$. In particular
also v and w are statistically independent. Using Eqs. (23) and (24) we deduce

$$ \hat y^T \hat y = x^T (1_N - H H^T) x = x^T K K^T x = w^T w , \tag{26} $$

proving that $\chi^2_{\rm min} = \hat y^T \hat y$ has a $\chi^2$-distribution with $N - P$ d.o.f. Finally, we obtain

$$ \Delta\chi^2 = \chi^2(\theta^0) - \chi^2_{\rm min} = x^T x - \hat y^T \hat y = x^T H H^T x = v^T v , \tag{27} $$

showing that $\Delta\chi^2$ has a $\chi^2$-distribution with P d.o.f. and is statistically independent of $\chi^2_{\rm min}$.
□
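
Proposition 1 can be checked numerically in the one case where it is exact, namely Gaussian data and a model that is linear in the parameters. The sketch below is our own toy illustration (a straight-line fit with unit errors, made-up true values and seed); it compares the sample means of $\chi^2_{\rm min}$ and $\Delta\chi^2$ with the expected values $N - P$ and $P$.

```python
import random

def simulate_once(rng, t, a0, b0):
    """One pseudo-experiment for nu_i = a0 + b0*t_i + unit Gaussian noise.

    Returns (chi2_min, delta_chi2) for the two-parameter straight-line fit.
    """
    n = len(t)
    nu = [a0 + b0 * ti + rng.gauss(0.0, 1.0) for ti in t]
    chi2_true = sum((v - a0 - b0 * ti) ** 2 for v, ti in zip(nu, t))
    # closed-form least-squares solution for intercept and slope
    t_mean = sum(t) / n
    nu_mean = sum(nu) / n
    s_tt = sum((ti - t_mean) ** 2 for ti in t)
    s_tn = sum((ti - t_mean) * (v - nu_mean) for ti, v in zip(t, nu))
    b_hat = s_tn / s_tt
    a_hat = nu_mean - b_hat * t_mean
    chi2_min = sum((v - a_hat - b_hat * ti) ** 2 for v, ti in zip(nu, t))
    return chi2_min, chi2_true - chi2_min

rng = random.Random(12345)
t = [float(i) for i in range(10)]          # N = 10 data points, P = 2 parameters
trials = [simulate_once(rng, t, 1.0, 0.5) for _ in range(4000)]
mean_min = sum(c for c, _ in trials) / len(trials)
mean_delta = sum(d for _, d in trials) / len(trials)
print(mean_min, mean_delta)                # expected near N - P = 8 and P = 2
```

Comparing only the means is of course a weak check; a fuller test would compare the whole sample distribution with the $\chi^2$ p.d.f.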



B. The parameter goodness-of-fit 

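Moving now to the PG we generalize in an obvious way the formalism of the previous
section by attaching an index r for the data set to each quantity. We have

$$ \chi^2_{\rm tot}(\theta) = \sum_r y_r(\theta)^T y_r(\theta) , \qquad \chi^2_{\rm tot}(\theta^0) = \sum_r x_r^T x_r , \tag{28} $$

and

$$ \bar\chi^2(\theta) = \chi^2_{\rm tot}(\theta) - \sum_r \chi^2_{r,{\rm min}} = \sum_r \left[ y_r(\theta)^T y_r(\theta) - \hat y_r^T \hat y_r \right] , \tag{29} $$

where $\hat y_r = y_r(\hat\theta_r)$ is evaluated at the minimum of the data set r alone.

Proposition 2 Let $\hat\theta$ be the values of the parameters which minimize $\bar\chi^2(\theta)$, or equivalently
$\chi^2_{\rm tot}(\theta)$. Then $\bar\chi^2_{\rm min} = \bar\chi^2(\hat\theta)$ follows a $\chi^2$-distribution with $P_c$ d.o.f., with

$$ P_c = \mathcal{P} - P , \qquad \mathcal{P} = \sum_{r=1}^{D} P_r , \qquad P_r = {\rm rank}[B_r] \quad {\rm and} \quad B_r = \left. \frac{\partial y_r}{\partial \theta} \right|_{\theta^0} . \tag{30} $$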

The matrices $B_r$ are of order $N_r \times P$. Since a given data set r may depend only on some
of the P parameters, or on some combination of them, in general one has to consider the
possibility of $P_r < P$.$^5$ This means that the symmetric $P \times P$ matrix $B_r^T B_r$ can be written as
$R_r b_r^T b_r R_r^T$, where $R_r$ is an orthogonal matrix and $b_r$ is a $P_r \times P$ "diagonal" matrix, such that
the diagonal $P \times P$ matrix $b_r^T b_r$ will have $P_r$ non-zero entries. Let us now define the $P \times P_r$
"diagonal" matrix $b_r^{-1}$ in such a way that $(b_r^{-1})_{\alpha\alpha} = 1/(b_r)_{\alpha\alpha}$ for each of the $P_r$ non-vanishing
entries of $b_r$, and all other elements are zero. In analogy to Eq. (22) we introduce now the
matrices

$$ H_r = B_r R_r b_r^{-1} , \tag{31} $$

which are of order $N_r \times P_r$. To prove Proposition 2 we define the vectors of length $\mathcal{P}$

$$ Y(\theta) = \begin{pmatrix} H_1^T y_1(\theta) \\ \vdots \\ H_D^T y_D(\theta) \end{pmatrix} , \qquad X = \begin{pmatrix} v_1 \\ \vdots \\ v_D \end{pmatrix} . \tag{32} $$

5 Note that the definition of $P_r$ in Eq. (30) is equivalent to the one given in Eq. (11).



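In the first part of the proof we show that $\bar\chi^2_{\rm min} = \hat Y^T \hat Y$ with $\hat Y = Y(\hat\theta)$. With arguments
similar to the ones leading to Eq. (21) we find

$$ \sum_r y_r(\hat\theta)^T y_r(\hat\theta) = \sum_r x_r^T x_r - (\hat\theta - \theta^0)^T \sum_r B_r^T B_r\, (\hat\theta - \theta^0) . \tag{33} $$

Using further Eq. (23) for each r we obtain

$$ \bar\chi^2_{\rm min} = \sum_r y_r(\hat\theta)^T y_r(\hat\theta) - \sum_r \hat y_r^T \hat y_r = \sum_r x_r^T H_r H_r^T x_r - (\hat\theta - \theta^0)^T \sum_r B_r^T B_r\, (\hat\theta - \theta^0) . \tag{34} $$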



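On the other hand we can use that the minimum values $\hat\theta$ are converging to the true values
$\theta^0$ in the large sample limit and write $\hat Y \approx X + \mathcal{B}(\hat\theta - \theta^0)$, where we have defined the $\mathcal{P} \times P$
matrix

$$ \mathcal{B} = \left. \frac{\partial Y}{\partial \theta} \right|_{\theta^0} = \begin{pmatrix} H_1^T B_1 \\ \vdots \\ H_D^T B_D \end{pmatrix} . \tag{35} $$

Without loss of generality we assume that rank[$\mathcal{B}$] = P. Again, with arguments similar to
the ones leading to Eq. (21) we derive

$$ \hat Y^T \hat Y = X^T X - (\hat\theta - \theta^0)^T \mathcal{B}^T \mathcal{B} (\hat\theta - \theta^0) . \tag{36} $$

Using Eq. (31) it is easy to show that $\mathcal{B}^T \mathcal{B} = \sum_r B_r^T B_r$, and by comparing Eqs. (34) and
(36) we can readily verify the relation $\bar\chi^2_{\rm min} = \hat Y^T \hat Y$.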
To complete the proof we identify $Y \leftrightarrow y$ and $X \leftrightarrow x$ and proceed in perfect analogy to
the proof of Proposition 1 given in Sec. III A. In particular, from the arguments presented
there it follows that the elements of $v_r$ are $P_r$ independent Gaussian variables with mean
zero and variance one. Since the D data sets are assumed to be statistically independent
the vector X contains $\mathcal{P}$ independent Gaussian variables with mean zero and variance one.
In analogy to the matrices H, K of Sec. III A we obtain now the $\mathcal{P} \times P$ matrix $\mathcal{H}$ and the
$\mathcal{P} \times P_c$ matrix $\mathcal{K}$, which fulfil $\mathcal{H}\mathcal{H}^T + \mathcal{K}\mathcal{K}^T = 1_{\mathcal{P}}$, and Eq. (36) becomes

$$ \hat Y^T \hat Y = X^T (1_{\mathcal{P}} - \mathcal{H}\mathcal{H}^T) X . \tag{37} $$

In analogy to the vector w from Eq. (25) we define now $W = \mathcal{K}^T X$, containing $P_c = \mathcal{P} - P$
independent Gaussian variables with mean zero and variance one, and Eq. (37) gives

$$ \hat Y^T \hat Y = X^T \mathcal{K}\mathcal{K}^T X = W^T W . \tag{38} $$

From Eq. (38) it is evident that $\bar\chi^2_{\rm min} = \hat Y^T \hat Y$ follows a $\chi^2$-distribution with $P_c$ d.o.f. □

Let us conclude this section by noting that both Propositions 1 and 2 are exact if the data
are multi-normally distributed and the theoretical predictions $\mu$, $\mu^r$ depend linearly on the
parameters $\theta$. If these requirements are not fulfilled the simplified expressions (5) and (12)
are valid only approximately, and to calculate the SG and the PG one should in principle
use the general formulas (4) and (10) instead. However, we want to stress that under rather
general conditions $\chi^2_{\rm min}$ and $\bar\chi^2_{\rm min}$ will be distributed as a $\chi^2$ in the large sample limit (i.e.
for large $\nu$ and $\nu^r$, respectively), so that even in the general case Eqs. (5) and (12) can still
be used.
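
The statement of Proposition 2 can likewise be illustrated with a toy Monte Carlo in the exactly solvable linear Gaussian case. In the sketch below (our own construction, with made-up true values, sample sizes and seed) data set 1 depends only on a parameter `eta`, so $P_1 = 1$, while data set 2 depends on `eta` and a slope `kappa`, so $P_2 = 2$. With $P = 2$ parameters in total one expects $\bar\chi^2_{\rm min}$ to follow a $\chi^2$-distribution with $P_c = 1 + 2 - 2 = 1$ d.o.f., and hence to have mean 1.

```python
import random

def const_chi2_min(nu):
    """Minimum of sum_i (nu_i - eta)^2 over eta (unit errors)."""
    mean = sum(nu) / len(nu)
    return sum((v - mean) ** 2 for v in nu)

def line_chi2_min(t, nu):
    """Minimum of sum_i (nu_i - eta - kappa*t_i)^2 over eta and kappa."""
    n = len(t)
    t_mean, nu_mean = sum(t) / n, sum(nu) / n
    s_tt = sum((ti - t_mean) ** 2 for ti in t)
    s_tn = sum((ti - t_mean) * (v - nu_mean) for ti, v in zip(t, nu))
    kappa = s_tn / s_tt
    eta = nu_mean - kappa * t_mean
    return sum((v - eta - kappa * ti) ** 2 for v, ti in zip(nu, t))

def chi2_bar_min(rng, eta0=1.0, kappa0=0.5, n1=5, n2=8):
    """One pseudo-experiment: chi2_tot,min minus the two individual minima."""
    t2 = [float(j + 1) for j in range(n2)]
    nu1 = [eta0 + rng.gauss(0.0, 1.0) for _ in range(n1)]            # set 1: eta only
    nu2 = [eta0 + kappa0 * tj + rng.gauss(0.0, 1.0) for tj in t2]    # set 2: eta and kappa
    # Global fit: set 1 corresponds to points at t = 0 of the same straight line.
    chi2_tot_min = line_chi2_min([0.0] * n1 + t2, nu1 + nu2)
    return chi2_tot_min - const_chi2_min(nu1) - line_chi2_min(t2, nu2)

rng = random.Random(2024)
samples = [chi2_bar_min(rng) for _ in range(4000)]
mean_bar = sum(samples) / len(samples)
print(mean_bar)   # Proposition 2 predicts a mean of P_c = 1
```

Note that the mean stays near 1 regardless of the numbers of data points `n1` and `n2`, which is the property that distinguishes the PG from the SG.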



IV. EXAMPLES AND DISCUSSION 

In this section we illustrate the application of the PG on some examples. In Sec. IV A we
show that in the simple case of two measurements of a single parameter the PG is identical to
the intuitive method of considering the difference of the two measurements, and in Sec. IV B
we show the consistency of the PG and the SG in the case of independent data points. In
Sec. IV C we discuss the application of the PG to neutrino oscillation data in the framework
of a sterile neutrino scheme. This problem was the original motivation to introduce the PG
in Ref. [5]. In Sec. IV D we add some general remarks on the PG.



A. The determination of one parameter by two experiments 

Let us consider two data sets observing the data points $\nu^1 = (\nu^1_i)$ ($i = 1, \ldots, N_1$) and
$\nu^2 = (\nu^2_i)$ ($i = 1, \ldots, N_2$). Further, we assume that the expectation values for both data
sets can be calculated from a theory depending on one parameter $\eta$: $\mu^r(\eta)$ ($r = 1, 2$), and
all $\nu^r_i$ are independent and normal distributed around the expectation values with variance
$\sigma_r^2$. Then we have the following $\chi^2$-functions for the two data sets $r = 1, 2$:

$$ \chi^2_r(\eta) = \sum_{i=1}^{N_r} \left( \frac{\nu^r_i - \mu^r(\eta)}{\sigma_r} \right)^2 = \chi^2_{r,{\rm min}} + \left( \frac{\eta - \hat\eta_r}{\hat\sigma_r} \right)^2 , \tag{39} $$

where $\hat\eta_r = \hat\eta_r(\nu^r)$ is the value of the parameter at the $\chi^2$-minimum of data set r and
$\hat\sigma_r$ is the corresponding standard deviation of $\hat\eta_r$. Now
one may ask the question whether the results of the two experiments are consistent. More
precisely, we are interested in the probability to obtain $\hat\eta_1$ and $\hat\eta_2$ under the assumption that
both result from the same true value $\eta^0$.

A standard method (see e.g. Ref. [8J] Sec. 14.3) to answer this question is to consider the
variable

$$ z = \frac{\hat\eta_1 - \hat\eta_2}{\sqrt{\hat\sigma_1^2 + \hat\sigma_2^2}} . \tag{40} $$

If the theory is correct z is normal distributed with mean zero and variance one. Hence we
can answer the question raised above by citing the probability to obtain $|z| \geq |z_{\rm obs}|$:

$$ p = 1 - \int_{-|z_{\rm obs}|}^{|z_{\rm obs}|} f_N(z; 0, 1)\, dz , \tag{41} $$

where $f_N$ denotes the normal distribution.

If the PG is applied to this problem, one obtains from Eq. (39)

$$ \bar\chi^2(\eta) = \left( \frac{\eta - \hat\eta_1}{\hat\sigma_1} \right)^2 + \left( \frac{\eta - \hat\eta_2}{\hat\sigma_2} \right)^2 , \tag{42} $$

and after some simple algebra one finds $\bar\chi^2_{\rm min} = z^2$, where z is given in Eq. (40). Obviously,
applying Eq. (12) to calculate the p-value according to the PG with the relevant number
of d.o.f. $P_c = 2 - 1 = 1$ leads to the same result as Eq. (41).

Hence, we arrive at the conclusion that in this simple case of testing the compatibility of 
two measurements for the mean of a Gaussian, the PG is identical to the intuitive method 
of testing whether the difference of the two values is consistent with zero. 
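
For completeness, the "simple algebra" behind this result can be spelled out as follows. The notation is ours: $\hat\eta_r$ is the best-fit value of data set r, $\hat\sigma_r$ its standard deviation, and $\hat\eta$ the value that minimizes the combined $\bar\chi^2$.

```latex
\begin{align*}
\bar\chi^2(\eta) &= \frac{(\eta-\hat\eta_1)^2}{\hat\sigma_1^2}
                  + \frac{(\eta-\hat\eta_2)^2}{\hat\sigma_2^2} , \\
\frac{d\bar\chi^2}{d\eta} = 0 \;&\Rightarrow\;
  \hat\eta = \frac{\hat\eta_1/\hat\sigma_1^2 + \hat\eta_2/\hat\sigma_2^2}
                  {1/\hat\sigma_1^2 + 1/\hat\sigma_2^2} , \\
\hat\eta - \hat\eta_1 &= \frac{\hat\sigma_1^2\,(\hat\eta_2-\hat\eta_1)}
                              {\hat\sigma_1^2+\hat\sigma_2^2} , \qquad
\hat\eta - \hat\eta_2 = \frac{\hat\sigma_2^2\,(\hat\eta_1-\hat\eta_2)}
                             {\hat\sigma_1^2+\hat\sigma_2^2} , \\
\bar\chi^2_{\rm min} = \bar\chi^2(\hat\eta)
  &= \frac{(\hat\sigma_1^2+\hat\sigma_2^2)\,(\hat\eta_1-\hat\eta_2)^2}
          {(\hat\sigma_1^2+\hat\sigma_2^2)^2}
   = \frac{(\hat\eta_1-\hat\eta_2)^2}{\hat\sigma_1^2+\hat\sigma_2^2} = z^2 .
\end{align*}
```

Since $z$ is a standard normal variable, $z^2$ follows a $\chi^2$-distribution with one d.o.f., so the two-sided Gaussian tail probability for $|z_{\rm obs}|$ and the $\chi^2$ tail probability for $z_{\rm obs}^2$ coincide.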



B. Consistency of PG and SG for independent data points 

As a further example of the consistency of the PG method we consider the case of N
statistically independent data points $\nu_i$. Let us denote by $\sigma_i$ the standard deviation of the
observation $\nu_i$ ($i = 1, \ldots, N$), and the corresponding theoretical prediction by $\mu_i(\theta)$, where
$\theta$ is the vector of P parameters. For simplicity, we assume that each of the $\mu_i$ depends at
least on one parameter. Then the $\chi^2$ is given by

$$ \chi^2(\theta) = \sum_{i=1}^{N} \chi^2_i(\theta) , \qquad {\rm where} \quad \chi^2_i(\theta) = \frac{[\nu_i - \mu_i(\theta)]^2}{\sigma_i^2} , \tag{43} $$

and from the SG construction (see Sec. III A) we know that $\chi^2_{\rm min}$ follows a $\chi^2$-distribution
with $N - P$ degrees of freedom. On the other hand, if we consider each single data point
as an independent data set and we apply the PG construction, we easily see that $\chi^2_{i,{\rm min}} = 0$
for each i. This implies $\bar\chi^2(\theta) = \chi^2(\theta)$, and in particular $\bar\chi^2_{\rm min} = \chi^2_{\rm min}$. Therefore, for the
specific case considered here one expects that SG and PG are identical.

To show that this is really the case let us first note that each matrix $\partial\mu_i/\partial\theta$ consists
just of a single line, and therefore it obviously has rank one. Hence, Eq. (11) gives $P_i = 1$
for each i. This reflects the fact that from the measurement of a single observable we
cannot derive independent bounds on P parameters, but only a single combination of them
is constrained. Therefore, the number of d.o.f. relevant for the calculation of the PG is
given by $P_c = \sum_{i=1}^{N} P_i - P = N - P$, which is exactly the number of d.o.f. relevant for the
SG. Hence, we have shown that in the considered case the two methods are equivalent and
consistent.
data set      parameters                                                                           N    $\chi^2_{\rm min}$/d.o.f.   SG
reactor       $\Delta m^2_{\rm sol}$, $\theta_{\rm sol}$                                           27   11.5/25                     99%
solar         $\Delta m^2_{\rm sol}$, $\theta_{\rm sol}$, $\eta_s$                                 81   65.8/78                     84%
atmospheric   $\Delta m^2_{\rm atm}$, $\theta_{\rm atm}$, $\eta_s$, $d_\mu$                        65   38.4/61                     99%

Table I: Parameter dependence, total number of data points, $\chi^2_{\rm min}$ and the corresponding SG for
the three data sets.

C. Application to neutrino oscillation data 

In this section we use real data from neutrino oscillation experiments to discuss the
application of the PG and to compare it to the SG. We consider the so-called (2+2) neutrino
mass scheme, where a fourth (sterile) neutrino is introduced in addition to the three
standard model neutrinos. In general this model is characterized by 9 parameters: 3 neutrino
mass-squared differences $\Delta m^2_{\rm sol}$, $\Delta m^2_{\rm atm}$, $\Delta m^2_{\rm LSND}$ and 6 mixing parameters $\theta_{\rm sol}$, $\theta_{\rm atm}$, $\theta_{\rm LSND}$,
$d_\mu$, $\eta_s$, $\eta_e$. The interested reader can find precise definitions of the parameters, applied
approximations, an extensive discussion of physics aspects, and references in Refs. SHIS.
Here we are interested mainly in the statistical aspects of the analysis, and therefore we
consider a simplified scenario.

We do not include LSND, KARMEN and all the experiments sensitive to $\Delta m^2_{\rm LSND}$ and
the corresponding mixing angle $\theta_{\rm LSND}$. Hence, we are left with three data sets from solar,
atmospheric and reactor neutrino experiments. The solar data set includes the current global
solar neutrino data from the SNO, Super-Kamiokande, Gallium and Chlorine experiments,
making a total of 81 data points, whereas the atmospheric data sample includes 65 data
points from the Super-Kamiokande and MACRO experiments (for details of the solar and
atmospheric analysis see Ref. |lol|). In the reactor data set we include only the data from
the KamLAND and the CHOOZ experiments, leading to a total of 13 + 14 = 27 data
points |HIE3|. In general the reactor experiments (especially CHOOZ) depend in addition
to $\Delta m^2_{\rm sol}$ and $\theta_{\rm sol}$ also on $\Delta m^2_{\rm atm}$ and a further mixing parameter $\eta_e$. However, we adopt here
the approximation $\eta_e = 1$, which is very well justified in the (2+2) scheme [3J]. This implies
that the dependence on $\Delta m^2_{\rm atm}$ disappears and we are left with the parameters $\Delta m^2_{\rm sol}$ and
$\theta_{\rm sol}$ for both reactor experiments, KamLAND as well as CHOOZ.

Under these approximations the experimental data sets we are using are described only
by the 6 parameters $\Delta m^2_{\rm sol}$, $\Delta m^2_{\rm atm}$, $\theta_{\rm sol}$, $\theta_{\rm atm}$, $\eta_s$, $d_\mu$. The parameter structure is illustrated
in Fig. 1. This simplified analysis serves well for discussing the statistical aspects of the
problem; a more general treatment including a detailed discussion of the physics is given in
Refs. In Tab. I we summarize the parameter dependence, the number of data points,
the minimum values of the $\chi^2$-functions and the resulting SG. We observe that all the data
sets analyzed alone give a very good fit. Let us remark that especially in the case of reactor
and atmospheric data the SG is suspiciously high. This may indicate that the errors have
been estimated very conservatively.

In Tab. II we show the results of an SG and PG analysis for various combinations of the
three data sets. In the first three lines in the table only two out of the three data sets are
combined. By combining solar and atmospheric neutrino data we find a $\chi^2_{\rm min}$ of 126.7. With
the quite large number of d.o.f. of 140 this gives an excellent SG of 78.3%. If, however, the
PG is applied we obtain a goodness-of-fit of only $3.54 \times 10^{-6}$. The reason for this very bad fit
can be understood from Figs. 1 and 2. From Fig. 1 one finds that solar and atmospheric data
are coupled by the parameter $\eta_s$. In Fig. 2 the $\Delta\chi^2$ is shown for both sets as a function of
this parameter. We find that there is indeed significant disagreement between the two data
sets:$^6$ solar data prefer values of $\eta_s$ close to zero, whereas atmospheric data prefer values

Figure 1: Parameter structure of the three data sets from reactor, solar and atmospheric neutrino
experiments.

Figure 2: PG for solar and atmospheric neutrino data in the (2+2) scheme.

6 The physical reason for this is that both data sets strongly disfavour oscillations into sterile neutrinos.
data sets            $N_{\rm tot}$   $\chi^2_{\rm tot,min}$/d.o.f.   SG      $\sum_r P_r$   P   $\bar\chi^2_{\rm min}/P_c$   PG
sol + atm            146             126.7/140                       78.3%   3+4            6   21.5/1                       $3.54 \times 10^{-6}$
react + sol          108             77.4/105                        98.0%   2+3            3   0.13/2                       93.5%
react + atm          92              49.9/86                         99.9%   2+4            6   0.0/0                        -
KamL + sol + atm     159             132.7/153                       88.1%   2+3+4          6   21.7/3                       $7.53 \times 10^{-5}$
react + sol + atm    173             138.2/167                       95.0%   2+3+4          6   21.7/3                       $7.53 \times 10^{-5}$

Table II: Comparison of SG and PG for various combinations of the data sets from solar,
atmospheric and reactor neutrino experiments.

close to one. There are two reasons why this strong disagreement does not show up in the 
SG. First, since the SG of both data sets alone is very good, there is much room to "hide" 
some problems in the combined analysis. Second, because of the large number of data points 
many of them actually might not be sensitive to the parameter $\eta_s$, where the disagreement
becomes manifest. Hence, the problem in the combined fit becomes diluted due to the large
number of data points. We conclude that the PG is very sensitive to disagreement of the
data sets, even in cases where the individual $\chi^2$-minima are very low, and when the number
of data points is large. 
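
The PG values quoted in this section can be cross-checked with a few lines of Python, since for $P_c = 1, 2, 3$ the $\chi^2$ tail probability has a simple closed form. This is only a sketch for verification: the function names are ours, and the inputs $\bar\chi^2_{\rm min} = 21.5$, 0.13 and 21.7 are the rounded values read from Table II (the first of these is the value that reproduces the quoted $3.54 \times 10^{-6}$ for one d.o.f.).

```python
import math

def pg_pvalue_1dof(chi2_bar):
    """chi^2 tail probability for P_c = 1: two-sided Gaussian tail."""
    return math.erfc(math.sqrt(0.5 * chi2_bar))

def pg_pvalue_2dof(chi2_bar):
    """chi^2 tail probability for P_c = 2: pure exponential."""
    return math.exp(-0.5 * chi2_bar)

def pg_pvalue_3dof(chi2_bar):
    """chi^2 tail probability for P_c = 3."""
    return (math.erfc(math.sqrt(0.5 * chi2_bar))
            + math.sqrt(2.0 * chi2_bar / math.pi) * math.exp(-0.5 * chi2_bar))

p_sol_atm = pg_pvalue_1dof(21.5)     # solar + atmospheric, P_c = 1
p_react_sol = pg_pvalue_2dof(0.13)   # reactor + solar, P_c = 2
p_all = pg_pvalue_3dof(21.7)         # reactor + solar + atmospheric, P_c = 3
print(p_sol_atm, p_react_sol, p_all)
```

Because the inputs are rounded, agreement with the tabulated PG values can only be expected at the level of a few per cent.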

In the reactor + solar analysis one finds complete agreement between the two data sets for
the SG as well as for the PG. This reflects the fact that the determinations of the parameters
$\theta_{\rm sol}$ and $\Delta m^2_{\rm sol}$ from reactor and solar neutrino experiments are in excellent agreement.
Finally, in the case of the combined analysis of reactor and atmospheric data the PG cannot
be applied. In our approximation these data sets have no parameter in common, as one can
see in Fig. 1. Hence, it makes no sense to test their compatibility, or even to combine them
at all.

In the lower part of Tab. II we show the results from combining all three data sets. By
comparing these results with the one from the solar + atmospheric analysis one can appreciate
the advantage of the PG. If we add only the 13 data points from KamLAND to the solar
and atmospheric samples we observe that the SG improves from 78.3% to 88.1%, whereas if
both reactor experiments are included we obtain an SG of 95.0%. This demonstrates that
the SG strongly depends on the number of data points. Especially the 14 data points from
CHOOZ contain nearly no relevant information, since the best fit values of $\Delta m^2_{\rm sol}$ and $\theta_{\rm sol}$ are
in the no-oscillation regime for CHOOZ, implying that the $\chi^2$ is flat. Moreover, since reactor
data are not sensitive to the parameter $\eta_s$ (see Fig. 1) the disagreement between solar and
atmospheric data becomes even more diluted by the additional reactor data points. This
clearly illustrates that the SG can be drastically improved by adding data which contain
no information on the relevant parameters. Also the PG improves slightly by adding reactor
data, reflecting the good agreement between solar and reactor data. However, the resulting
PG still is very small due to the disagreement between solar and atmospheric data in the
model under consideration. Moreover, the PG is completely unaffected by the addition of
the CHOOZ data, because the $\chi^2$ of CHOOZ is flat in the relevant parameter region, and
the PG is sensitive only to the parameter dependence of the data sets.

6 (continued) Since it is a generic prediction of the (2+2) scheme that the sterile neutrino must show up either in solar
or in atmospheric neutrino oscillations the model is ruled out by the PG test [5].

Finally, we mention that in view of the analyses shown in Tab. II the meaning of $P_c$, the
number of d.o.f. for PG, becomes clear. It corresponds to the number of parameters coupling
the data sets. Solar and atmospheric data are coupled only by $\eta_s$, hence $P_c = 1$, whereas
reactor and solar data are coupled by $\theta_{\rm sol}$ and $\Delta m^2_{\rm sol}$ and $P_c = 2$. Atmospheric data has no
parameter in common with reactor data, therefore $P_c = 0$. In the combination of reactor +
solar + atmospheric data sets the three parameters $\eta_s$, $\theta_{\rm sol}$, $\Delta m^2_{\rm sol}$ provide the coupling and
$P_c = 3$.
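
The counting of coupling parameters described above can be reproduced with a short sketch. It assumes, as in the simplified analysis of this section, that each data set constrains all of its parameters independently, so that $P_r$ is simply the number of parameters of set r. The function name `coupling_dof` and the string labels for the parameters are our own.

```python
def coupling_dof(*param_sets):
    """P_c = sum_r P_r - P, with P_r = number of parameters of data set r
    and P = number of distinct parameters in the combination.

    Assumes every parameter of a set is independently constrained by it.
    """
    total = sum(len(s) for s in param_sets)
    distinct = set().union(*param_sets)
    return total - len(distinct)

reactor = {"dm2_sol", "theta_sol"}
solar = {"dm2_sol", "theta_sol", "eta_s"}
atmospheric = {"dm2_atm", "theta_atm", "eta_s", "d_mu"}

print(coupling_dof(solar, atmospheric))            # coupled only by eta_s
print(coupling_dof(reactor, solar))                # coupled by dm2_sol, theta_sol
print(coupling_dof(reactor, atmospheric))          # no common parameter
print(coupling_dof(reactor, solar, atmospheric))   # all three data sets
```

If a data set constrains only some combination of its parameters, the set sizes would have to be replaced by the ranks $P_r$ of Eq. (11).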

D. General remarks on the PG 

(a) Using the relation $\bar\chi^2_{\rm min} = \sum_r \Delta\chi^2_r(\hat\theta)$ one can obtain more insight into the quality
of the fit by considering the contribution of each data set to $\bar\chi^2_{\rm min}$. If the PG is poor it
is possible to identify the data sets leading to the problems in the fit by looking at the
individual values of $\Delta\chi^2_r(\hat\theta)$. In this sense the PG is similar to the so-called "pull approach"
discussed in Ref. [l^ in relation with solar neutrino analysis.

(b) One should keep in mind that the PG is completely insensitive to the goodness-of-fit
of the individual data sets. Because of the subtraction of the $\chi^2_{r,{\rm min}}$ in Eq. (8) all the
information on the quality of the fit of the data sets alone is lost. One may benefit from this
property if the SG of the individual data sets is very good (see the example in Sec. IV C).
On the other hand, if e.g. one data set gives a bad fit on its own this will not show up in the
PG. Only the compatibility of the data sets is tested, irrespective of their individual SGs.

(c) The PG might be also useful if one is interested whether a data set consisting of
very few data points is in agreement with a large data sample.$^7$ The SG of the combined
analysis will be completely dominated by the large sample and the information contained in 
the small data sample may be drowned out by the large number of data points. In such a 
case the PG can give valuable information on the compatibility of the two sets, because it is 
not diluted by the number of data points in each set and it is sensitive only to the parameter 
dependence of the sets. 
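
Remark (a) can be illustrated with a toy example of three independent measurements of a single parameter. All numbers below are made up for illustration, and `pg_contributions` is a hypothetical helper. Each data set contributes $\Delta\chi^2_r = [(\hat\eta_r - \hat\eta)/\hat\sigma_r]^2$ at the combined best fit $\hat\eta$, and the set with the largest contribution is the one in tension with the others.

```python
import math

def pg_contributions(estimates):
    """estimates: list of (eta_hat_r, sigma_hat_r) from independent data sets.

    Returns (eta_best, [delta_chi2_r]) for one common parameter, where
    eta_best is the inverse-variance weighted mean.
    """
    weights = [1.0 / s ** 2 for _, s in estimates]
    eta_best = sum(w * e for w, (e, _) in zip(weights, estimates)) / sum(weights)
    deltas = [w * (e - eta_best) ** 2 for w, (e, _) in zip(weights, estimates)]
    return eta_best, deltas

toy = [(1.0, 0.1), (1.1, 0.1), (2.0, 0.2)]   # made-up best fits and errors
eta_best, deltas = pg_contributions(toy)
chi2_bar_min = sum(deltas)
p_pg = math.exp(-0.5 * chi2_bar_min)         # chi^2 tail for P_c = 3 - 1 = 2 d.o.f.
worst = deltas.index(max(deltas))
print(eta_best, deltas, p_pg, worst)
```

In this toy case the third measurement dominates $\bar\chi^2_{\rm min}$, which is exactly the kind of diagnostic information remark (a) refers to.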

V. CORRELATIONS DUE TO THEORETICAL UNCERTAINTIES 

One of the limitations of the PG is that it can be applied only if the data sets are 
statistically independent. In many physically interesting situations (for example, different 
solar neutrino experiments) this is not the case since theoretical uncertainties introduce 
correlations between the results of different - and otherwise independent - experiments. 
However, in such a case one can take advantage of the so-called pull approach, which, as 
demonstrated in Ref. [l^], is equivalent to the usual covariance method. In that paper it
was shown that if correlations due to theoretical errors exist, it is possible to account for
them by introducing new parameters $\xi_a$ and adding penalty functions to the $\chi^2$. In this way
it is possible to get rid of unwanted correlations and the PG can be applied. The correlation
parameters $\xi_a$ should be treated in the same way as the parameters of the theoretical
model.

7 For example one could think of a combination of the 19 neutrino events from the Supernova 1987A with
the high statistics global solar neutrino data.

In this section we illustrate this procedure by considering a generic experiment with an
uncertainty on the normalisation of the predicted number of events. Let the experiment
observe some energy spectrum which is divided into $N$ bins. The theoretical prediction for
the bin $i$ is denoted by $\mu_i(\theta)$, depending on $P$ parameters $\theta$. In practice $\mu_i(\theta)$ is often not
known exactly. Let us consider the case of a fully correlated relative error $\sigma_{\mathrm{th}}$. A common
method to treat such an error is to add statistical and theoretical errors in quadrature,
leading to the covariance matrix

$$S_{ij}(\theta) = \delta_{ij}\,\sigma_{i,\mathrm{stat}}^2 + \sigma_{\mathrm{th}}^2\,\mu_i(\theta)\,\mu_j(\theta) \,, \tag{44}$$

where $\sigma_{i,\mathrm{stat}}$ is the statistical error in the bin $i$. In the case of neutrino oscillation experiments
such a correlated error results e.g. from an uncertainty of the initial flux normalisation or
of the fiducial detector volume. The $\chi^2$ is given by

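$$\chi^2(\theta) = \sum_{i,j=1}^{N} [\nu_i - \mu_i(\theta)]\, S^{-1}_{ij}(\theta)\, [\nu_j - \mu_j(\theta)] \,, \tag{45}$$

where $\nu_i$ are the observations. As shown in Ref. 0], instead of Eq. (45) we can equivalently
use

$$\chi^2(\theta,\xi) = \sum_{i=1}^{N} \left(\frac{\nu_i - \xi\,\mu_i(\theta)}{\sigma_{i,\mathrm{stat}}}\right)^2 + \left(\frac{\xi - 1}{\sigma_{\mathrm{th}}}\right)^2 \tag{46}$$

and minimize with respect to the new parameter $\xi$.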
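
The equivalence of the covariance form and the pull form can be checked numerically at a fixed value of $\theta$. The sketch below is only an illustration under stated assumptions: the bin contents, predictions and errors are hypothetical, the function names are illustrative, and the minimisation over $\xi$ is done analytically because the pull $\chi^2$ is quadratic in $\xi$.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def chi2_covariance(nu, mu, sig_stat, sig_th):
    """Covariance method: r^T S^{-1} r with S_ij = delta_ij sig_i^2 + sig_th^2 mu_i mu_j."""
    n = len(nu)
    S = [[(sig_stat[i] ** 2 if i == j else 0.0) + sig_th ** 2 * mu[i] * mu[j]
          for j in range(n)] for i in range(n)]
    r = [nu[i] - mu[i] for i in range(n)]
    x = solve(S, r)  # x = S^{-1} r
    return sum(r[i] * x[i] for i in range(n))

def chi2_pull(nu, mu, sig_stat, sig_th):
    """Pull method: minimise over xi analytically, return (chi2_min, xi_hat)."""
    w = [1.0 / s ** 2 for s in sig_stat]
    a = sum(wi * m * m for wi, m in zip(w, mu)) + 1.0 / sig_th ** 2
    b = sum(wi * n_ * m for wi, n_, m in zip(w, nu, mu)) + 1.0 / sig_th ** 2
    xi = b / a  # solves d(chi2)/d(xi) = 0
    chi2 = sum(wi * (n_ - xi * m) ** 2 for wi, n_, m in zip(w, nu, mu))
    chi2 += ((xi - 1.0) / sig_th) ** 2
    return chi2, xi

# Hypothetical four-bin spectrum at some fixed theta
nu = [105.0, 98.0, 87.0, 110.0]        # observed events
mu = [100.0, 95.0, 90.0, 100.0]        # predicted events mu_i(theta)
sig_stat = [10.0, 9.7, 9.5, 10.0]      # statistical errors
sig_th = 0.05                          # fully correlated relative error

chi2_cov = chi2_covariance(nu, mu, sig_stat, sig_th)
chi2_pl, xi_hat = chi2_pull(nu, mu, sig_stat, sig_th)
print(chi2_cov, chi2_pl, xi_hat)
```

The two numbers agree to rounding precision, as expected from the Sherman-Morrison formula for the inverse of a diagonal matrix plus a rank-one term.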

On the other hand, if $\xi$ is considered as an additional parameter, on the same footing as
$\theta$, all the data points are formally uncorrelated and it is straightforward to apply the PG.
Subtracting the minimum of the first term in Eq. (46) with respect to $\theta$ and $\xi$ one obtains

$$\bar\chi^2(\theta,\xi) = \Delta\chi^2(\theta,\xi) + \left(\frac{\xi - 1}{\sigma_{\mathrm{th}}}\right)^2 . \tag{47}$$

The external information on the parameter $\xi$ represented by the second term in Eq. (47)
is considered as an additional data set. Evaluating the minimum of Eq. (47) for 1 d.o.f.
is a convenient method to test if the best fit point of the model is in agreement with the
constraint on the over-all normalisation. In particular one can identify whether a problem
in the fit comes from the spectral shape (first term) or the total rate (second term).
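
A minimal sketch of this rate-versus-shape test is given below. It assumes a hypothetical toy model with one parameter, $\mu_i(\theta) = 1 + \theta x_i$, so that $\xi\mu_i = a + b x_i$ with $a = \xi$ and $b = \xi\theta$, and the first term becomes an ordinary weighted straight-line fit. The data values and the function names are illustrative assumptions, not taken from any experiment.

```python
import math

def weighted_sums(x, y, sig):
    """Weighted sums entering the normal equations of a straight-line fit."""
    w = [1.0 / s ** 2 for s in sig]
    S = sum(w)
    Sx = sum(wi * xi for wi, xi in zip(w, x))
    Sxx = sum(wi * xi * xi for wi, xi in zip(w, x))
    Sy = sum(wi * yi for wi, yi in zip(w, y))
    Sxy = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
    return S, Sx, Sxx, Sy, Sxy

def chi2_shape(a, b, x, y, sig):
    """Spectral (first) term of the pull chi^2 for the toy model xi*mu_i = a + b*x_i."""
    return sum(((yi - a - b * xi) / si) ** 2 for xi, yi, si in zip(x, y, sig))

def pg_normalisation(x, y, sig, sig_th):
    """Minimum of Delta chi^2_shape + ((a - 1)/sig_th)^2 and its p-value for 1 d.o.f."""
    S, Sx, Sxx, Sy, Sxy = weighted_sums(x, y, sig)
    D = S * Sxx - Sx ** 2
    a_hat = (Sxx * Sy - Sx * Sxy) / D   # normalisation preferred by the spectrum
    b_hat = (S * Sxy - Sx * Sy) / D
    var_a = Sxx / D                     # variance of a_hat, slope profiled out
    chi2_min_shape = chi2_shape(a_hat, b_hat, x, y, sig)
    # The statistic is quadratic in (a, b): its minimum sits at the weighted mean
    # of a_hat and the external value 1, with b profiled at that a.
    a_star = (a_hat / var_a + 1.0 / sig_th ** 2) / (1.0 / var_a + 1.0 / sig_th ** 2)
    b_star = (Sxy - a_star * Sx) / Sxx
    chi2_bar = chi2_shape(a_star, b_star, x, y, sig) - chi2_min_shape
    chi2_bar += ((a_star - 1.0) / sig_th) ** 2
    chi2_bar = max(chi2_bar, 0.0)       # guard against rounding below zero
    p_value = math.erfc(math.sqrt(chi2_bar / 2.0))  # chi^2 tail probability, 1 d.o.f.
    return chi2_bar, p_value, a_hat, var_a

# Hypothetical spectrum whose shape prefers a normalisation of 1.3
x = [0.0, 1.0, 2.0, 3.0]
y = [1.3 + 0.5 * xi for xi in x]
sig = [0.1] * 4
sig_th = 0.1                            # external 10% normalisation uncertainty

chi2_bar, p_value, a_hat, var_a = pg_normalisation(x, y, sig, sig_th)
print(chi2_bar, p_value, a_hat, var_a)
```

For a linear model the result coincides with $(\hat a - 1)^2/(\sigma_a^2 + \sigma_{\mathrm{th}}^2)$, i.e. with the intuitive test of whether the normalisation preferred by the spectrum is consistent with the external constraint.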

Moreover, one may like to divide the data into two parts, set I consisting of bins $1, \ldots, n$
and set II consisting of bins $n+1, \ldots, N$, and test whether these data sets are compatible.
Eq. (46) can be written as

$$\chi^2(\theta,\xi) = \sum_{i=1}^{n} \left(\frac{\nu_i - \xi\,\mu_i(\theta)}{\sigma_{i,\mathrm{stat}}}\right)^2 + \sum_{i=n+1}^{N} \left(\frac{\nu_i - \xi\,\mu_i(\theta)}{\sigma_{i,\mathrm{stat}}}\right)^2 + \left(\frac{\xi - 1}{\sigma_{\mathrm{th}}}\right)^2 , \tag{48}$$



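and subtracting the minima of the first two terms gives the $\bar\chi^2$ relevant for the PG:

$$\bar\chi^2(\theta,\xi) = \Delta\chi^2_{\mathrm{I}}(\theta,\xi) + \Delta\chi^2_{\mathrm{II}}(\theta,\xi) + \left(\frac{\xi - 1}{\sigma_{\mathrm{th}}}\right)^2 . \tag{49}$$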

Assuming that the data sets I and II both depend on all $P$ parameters $\theta$, the minimum of
this $\bar\chi^2$ has to be evaluated for $P + 2$ d.o.f. to obtain the PG: sets I and II each depend on
the $P + 1$ parameters $(\theta, \xi)$ and the constraint depends on $\xi$ alone, so that
$(P+1) + (P+1) + 1 - (P+1) = P + 2$. This procedure tests whether
the data sets I and II are consistent with each other and the constraint on the over-all
normalisation. By considering the relative contributions of the three terms in Eq. (49) it is
possible to identify potential problems in the fit. For example one may test whether a bad fit
is dominated only by a small subset of the data, e.g. a few bins at the low or high end of the
spectrum. Alternatively, the two data sets I and II can come from two different experiments
correlated by a common normalisation error, e.g. two detectors observing events from the
same beam.
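
The three-term test can be sketched in the same hypothetical toy model, $\xi\mu_i(\theta) = a + b x_i$ with $a = \xi$, $b = \xi\theta$ and $P = 1$. The sketch assumes that the minimum of the PG statistic equals the minimum of the total $\chi^2$ minus the minima of the individual sets, and it treats the normalisation penalty as a pseudo-measurement of $a$ at $x = 0$ with value 1 and error $\sigma_{\mathrm{th}}$. All numbers and function names are illustrative.

```python
import math

def chi2_sf(x, ndof):
    """Upper-tail probability of a chi^2 variable with integer ndof (closed-form series)."""
    if x <= 0.0:
        return 1.0
    if ndof % 2 == 0:
        term, total = 1.0, 1.0
        for k in range(1, ndof // 2):
            term *= (x / 2.0) / k
            total += term
        return math.exp(-x / 2.0) * total
    chi = math.sqrt(x)
    term, total = chi, 0.0
    for r in range(1, (ndof - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)
        total += term
    gauss = math.exp(-x / 2.0) / math.sqrt(2.0 * math.pi)
    return math.erfc(chi / math.sqrt(2.0)) + 2.0 * gauss * total

def line_chi2_min(x, y, sig):
    """Minimum chi^2 of a weighted straight-line fit y = a + b x."""
    w = [1.0 / s ** 2 for s in sig]
    S = sum(w)
    Sx = sum(wi * xi for wi, xi in zip(w, x))
    Sxx = sum(wi * xi * xi for wi, xi in zip(w, x))
    Sy = sum(wi * yi for wi, yi in zip(w, y))
    Sxy = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
    D = S * Sxx - Sx ** 2
    a = (Sxx * Sy - Sx * Sxy) / D
    b = (S * Sxy - Sx * Sy) / D
    return sum(wi * (yi - a - b * xi) ** 2 for wi, xi, yi in zip(w, x, y))

def pg_two_sets(x, y, sig, n, sig_th, n_par=1):
    """PG for set I (first n bins), set II (remaining bins) and the normalisation constraint."""
    chi2_I = line_chi2_min(x[:n], y[:n], sig[:n])
    chi2_II = line_chi2_min(x[n:], y[n:], sig[n:])
    # Total chi^2: all bins plus the penalty, written as a pseudo-point (x=0, y=1, sig_th)
    chi2_tot = line_chi2_min(x + [0.0], y + [1.0], sig + [sig_th])
    chi2_bar_min = max(chi2_tot - chi2_I - chi2_II, 0.0)  # constraint alone has minimum 0
    return chi2_bar_min, chi2_sf(chi2_bar_min, n_par + 2)

# Hypothetical spectrum: the last four bins are shifted upwards by 0.5
x = [float(i) for i in range(8)]
y = [1.0 + 0.5 * xi for xi in x[:4]] + [1.5 + 0.5 * xi for xi in x[4:]]
sig = [0.1] * 8
sig_th = 0.1

chi2_bar_min, p_value = pg_two_sets(x, y, sig, 4, sig_th)
print(chi2_bar_min, p_value)
```

In this example each half of the spectrum is fitted perfectly on its own, so the SG of the individual sets carries no warning, while the PG for 3 d.o.f. signals that the two halves and the normalisation constraint cannot be accommodated simultaneously.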

It is straightforward to apply the method sketched in this section also in more complicated
situations. For example, if there are several sources of theoretical errors leading
to more complicated correlations the pull approach can also be applied by introducing a
parameter $\xi_a$ for each theoretical error 0|. In a similar way one can treat the case when
the compatibility of several experiments should be tested, which are correlated by common
theoretical uncertainties. (Consider, e.g., the various solar neutrino experiments, which are
correlated due to the uncertainties on the solar neutrino flux predictions.)



VI. CONCLUSIONS 

In this note we have discussed a goodness-of-fit method which was proposed in Ref. 0. 
The so-called parameter goodness-of-fit (PG) can be applied when the global data consists 
of several statistically independent subsets. Its construction and application are very similar 
to the standard goodness-of-fit. We gave a formal derivation of the probability distribution 
function of the proposed test statistic, based on standard theorems of statistics, and illustrated
the application of the PG with some examples. We have shown that in the simple case
of two data sets determining the mean of a Gaussian, the PG is identical to the intuitive 
method of testing whether the difference of the two measurements is consistent with zero. 
Furthermore, we have compared the standard goodness-of-fit and the PG by using real data 
from neutrino oscillation experiments, which have been the original motivation for the PG. 
In addition we have illustrated that the so-called pull approach allows one to apply the PG also
in cases where the data sets are correlated due to theoretical uncertainties.

The proposed method tests the compatibility of different data sets, and it gives sensible 
results even in cases where the errors are estimated very conservatively and/or the total 
number of data points is very large. In particular, it avoids the problem that a possible 
disagreement between data sets becomes diluted by data points which are insensitive to the 
problem in the fit. The PG can also be very useful when a set consisting of a rather small 
number of data points is combined with a very large data sample. 

To conclude, we believe that physicists should keep an open mind when choosing a statistical
method for analyzing experimental data. In many cases much more information can
be extracted from data if the optimal statistical tool is used. We think that the method discussed
in this note may be useful in several fields of physics, especially where global analyses
of large amounts of data are performed.